AITopics | online combinatorial optimization

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

artificial intelligence, data mining, machine learning, (19 more...)

arXiv.org Machine Learning

2604.25269

Genre: Research Report (0.69)

Industry:

Transportation > Infrastructure & Services (0.34)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.41)
Information Technology > Data Science > Data Mining > Big Data (0.34)

Add feedback

Online combinatorial optimization with stochastic decision sets and adversarial losses

Neural Information Processing SystemsDec-27-2025, 15:05:28 GMT

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

name change, online combinatorial optimization, stochastic decision, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

Online combinatorial optimization with stochastic decision sets and adversarial losses

Neural Information Processing SystemsSep-30-2025, 08:20:56 GMT

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

name change, online combinatorial optimization, stochastic decision, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.77)

Add feedback

Online combinatorial optimization with stochastic decision sets and adversarial losses

Neural Information Processing SystemsFeb-9-2025, 19:42:27 GMT

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

algorithm, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Poland (0.04)
Europe > France (0.04)

Industry:

Transportation > Infrastructure & Services (0.34)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.51)

Add feedback

Online combinatorial optimization with stochastic decision sets and adversarial losses

Neural Information Processing SystemsMar-13-2024, 12:43:20 GMT

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times. We deliver regret bounds for our algorithms for the previously studied full information and (semi-)bandit settings, as well as a natural middle point between the two that we call the restricted information setting. A special consequence of our results is a significant improvement of the best known performance guarantees achieved by an efficient algorithm for the sleeping bandit problem with stochastic availability. Finally, we evaluate our algorithms empirically and show their improvement over the known approaches.

algorithm, learner, loss estimate, (14 more...)

Neural Information Processing Systems

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Poland (0.04)
Europe > France (0.04)

Industry:

Transportation > Infrastructure & Services (0.34)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.51)

Add feedback

Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

Zhou, Runlong, He, Zelin, Tian, Yuandong, Wu, Yi, Du, Simon S.

arXiv.org Artificial IntelligenceNov-3-2023

Over the recent years, reinforcement learning (RL) starts to show promising results in tackling combinatorial optimization (CO) problems, in particular when coupled with curriculum learning to facilitate training. Despite emerging empirical evidence, theoretical study on why RL helps is still at its early stage. This paper presents the first systematic study on policy optimization methods for online CO problems. We show that online CO problems can be naturally formulated as latent Markov Decision Processes (LMDPs), and prove convergence bounds on natural policy gradient (NPG) for solving LMDPs. Furthermore, our theory explains the benefit of curriculum learning: it can find a strong sampling policy and reduce the distribution shift, a critical quantity that governs the convergence rate in our theorem. For a canonical online CO problem, the Best Choice Problem (BCP), we formally prove that distribution shift is reduced exponentially with curriculum learning even if the curriculum is a randomly generated BCP on a smaller scale. Our theory also shows we can simplify the curriculum learning scheme used in prior work from multi-step to single-step. Lastly, we provide extensive experiments on the Best Choice Problem, Online Knapsack, and AdWords to verify our findings.

online combinatorial optimization, optimization, policy optimization, (1 more...)

arXiv.org Artificial Intelligence

2202.05423

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.60)

Add feedback

Online combinatorial optimization with stochastic decision sets and adversarial losses

Neu, Gergely, Valko, Michal

Neural Information Processing SystemsFeb-14-2020, 10:43:35 GMT

Most work on sequential learning assumes a fixed set of actions that are available all the time. However, in practice, actions can consist of picking subsets of readings from sensors that may break from time to time, road segments that can be blocked or goods that are out of stock. In this paper we study learning algorithms that are able to deal with stochastic availability of such unreliable composite actions. We propose and analyze algorithms based on the Follow-The-Perturbed-Leader prediction method for several learning settings differing in the feedback provided to the learner. Our algorithms rely on a novel loss estimation technique that we call Counting Asleep Times.

algorithm, online combinatorial optimization, stochastic decision, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.40)

Add feedback

Hardness of Online Sleeping Combinatorial Optimization Problems

Kale, Satyen, Lee, Chansoo, Pal, David

Neural Information Processing SystemsDec-31-2016

We show that several online combinatorial optimization problems that admit efficient no-regret algorithms become computationally hard in the sleeping setting where a subset of actions becomes unavailable in each round. Specifically, we show that the sleeping versions of these problems are at least as hard as PAC learning DNF expressions, a long standing open problem. We show hardness for the sleeping versions of Online Shortest Paths, Online Minimum Spanning Tree, Online k-Subsets, Online k-Truncated Permutations, Online Minimum Cut, and Online Bipartite Matching. The hardness result for the sleeping version of the Online Shortest Paths problem resolves an open problem presented at COLT 2015 [Koolen et al., 2015].

algorithm, artificial intelligence, optimization problem, (16 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.72)

Add feedback

Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits

Neu, Gergely, Bartók, Gábor

arXiv.org Machine LearningAug-31-2016

We propose a sample-efficient alternative for importance weighting for situations where one only has sample access to the probability distribution that generates the observations. Our new method, called Geometric Resampling (GR), is described and analyzed in the context of online combinatorial optimization under semi-bandit feedback, where a learner sequentially selects its actions from a combinatorial decision set so as to minimize its cumulative loss. In particular, we show that the well-known Follow-the-Perturbed-Leader (FPL) prediction method coupled with Geometric Resampling yields the first computationally efficient reduction from offline to online optimization in this setting. We provide a thorough theoretical analysis for the resulting algorithm, showing that its performance is on par with previous, inefficient solutions. Our main contribution is showing that, despite the relatively large variance induced by the GR procedure, our performance guarantees hold with high probability rather than only in expectation. As a side result, we also improve the best known regret bounds for FPL in online combinatorial optimization with full feedback, closing the perceived performance gap between FPL and exponential weights in this setting.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

1503.05087

Country:

Europe (1.00)
North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.56)
Information Technology > Data Science > Data Mining > Big Data (0.47)

Add feedback

First-order regret bounds for combinatorial semi-bandits

Neu, Gergely

arXiv.org Machine LearningJun-10-2015

We consider the problem of online combinatorial optimization under semi-bandit feedback, where a learner has to repeatedly pick actions from a combinatorial decision set in order to minimize the total losses associated with its decisions. After making each decision, the learner observes the losses associated with its action, but not other losses. For this problem, there are several learning algorithms that guarantee that the learner's expected regret grows as $\widetilde{O}(\sqrt{T})$ with the number of rounds $T$. In this paper, we propose an algorithm that improves this scaling to $\widetilde{O}(\sqrt{{L_T^*}})$, where $L_T^*$ is the total loss of the best action. Our algorithm is among the first to achieve such guarantees in a partial-feedback scheme, and the first one to do so in a combinatorial setting.

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

1502.06354

Country: Europe (0.46)

Genre: Research Report (0.50)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.35)

Add feedback

Filters

Collaborating Authors

online combinatorial optimization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Online combinatorial optimization with stochastic decision sets and adversarial losses

Online combinatorial optimization with stochastic decision sets and adversarial losses

Online combinatorial optimization with stochastic decision sets and adversarial losses

Online combinatorial optimization with stochastic decision sets and adversarial losses

Online combinatorial optimization with stochastic decision sets and adversarial losses

Understanding Curriculum Learning in Policy Optimization for Online Combinatorial Optimization

Online combinatorial optimization with stochastic decision sets and adversarial losses

Hardness of Online Sleeping Combinatorial Optimization Problems

Importance weighting without importance weights: An efficient algorithm for combinatorial semi-bandits

First-order regret bounds for combinatorial semi-bandits